Ecological Informatics
Elsevier BV
Preprints posted in the last 30 days, ranked by how well they match Ecological Informatics's content profile, based on 29 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Werber, Y.
Radar aeroecology is dedicated to making ecological inference about aerial wildlife from radar-derived information. While producing unique, large-scale datasets describing biological activity in the sky, radar methodologies are largely incapable of relating these to specific species and are thus taxonomically limited. I describe a computational method to increase taxonomic resolution in vertical-looking radar data by dividing detected organisms into morphology- and movement-based aerial morphotypes. Using the Birdscan MR1 radar target classifier, wing flapping frequency calculation and target size estimation, I demonstrate a nearly 8-fold increase in classification resolution of bird radar data from the Hula Valley Research station, Israel. Furthermore, by relating each species in the region's species pool to its relevant morphotype, I show that most of these newly separated classes are related to small numbers of species (1-10), providing realistic opportunities to bridge the taxonomy gap in radar data. By using the morphotype approach, radar aeroecologists can start observing and discussing the concept of "Aerodiversity", analogous to the widely used biodiversity, a fundamental measure in ecology and conservation sciences. By analytically addressing taxonomy in radar aeroecology, practitioners will increase the impact and dissemination of their work and contribute to a better, more complete understanding of the aerial habitat.
Malerba, M. E.; Perez-Granados, C.; Bell, K.; Palacios, M. M.; Bellisario, K. M.; Desjonqueres, C.; Marquez-Rodriguez, A.; Mendoza, I.; Meyer, C. F. J.; Ramesh, V.; Raick, X.; Rhinehart, T. A.; Wood, C. M.; Ziegenhorn, M. A.; Buscaino, G.; Campos-Cerqueira, M.; Duarte, M. H. L.; Gasc, A.; Hanf-Dressler, T.; Juanes, F.; do Nascimento, L. A.; Rountree, R. A.; Thomisch, K.; Toledo, L. F.; Toka, M.; Vieira, M.
Passive acoustic monitoring (PAM) enables non-invasive sampling of wildlife across broad spatial, temporal and taxonomic scales. Its ongoing and widespread use has generated unprecedented volumes of acoustic data, shifting the primary bottleneck from data collection to the storage, processing, integration, and interpretation of PAM outputs. Although many software tools exist to address these challenges, differences in their design, scope, and usability often create fragmented and complex analytical workflows. To identify the key barriers and opportunities shaping the implementation of PAM surveys, we conducted a structured expert solicitation involving 30 international practitioners working across terrestrial and aquatic ecosystems. Experts identified and ranked their most critical pain points in current PAM workflows, spanning data storage, processing, and interpretation. The top challenge identified related to accurate species identification using deep learning and artificial intelligence (AI) models, especially in noisy soundscapes or for underrepresented taxa. Eight additional priority challenges included workflow fragmentation, limited availability of user-friendly analytical and visualisation tools, uneven access to software, manual validation bottlenecks, computational constraints, and difficulties in data handling, standardisation, and sharing. Participants also proposed practical mitigation strategies for these priority challenges, supported by step-by-step guidance to help overcome key barriers. Together, these insights provide a roadmap toward more scalable, open-access, and collaborative software systems, which are increasingly essential to realise the full potential of PAM in global biodiversity monitoring.
Croasdale, E. M.; Saponari, L.; Dale, C.; Shah, N.; Williams, B.; Lamont, T. A. C.
Coral restoration is recognised as a critical tool to mitigate pantropical degradation of reef ecosystems. Robust monitoring of restoration progress is crucial for projects to evaluate their success, improve practice, and share knowledge. However, traditional visual surveys often fail to capture the full impact of coral restoration on reef function. Therefore, we employed Passive Acoustic Monitoring (PAM) to assess whether the soundscape of a coral restoration site in the Seychelles differs from adjacent healthy and degraded reference reefs. We applied two methods of soundscape analysis: manual detection of unidentified fish sounds; and machine learning-based Uniform Manifold Approximation and Projection analysis. Results were approach-specific: the manual approach highlighted similarities in fish calls between the restoration site and the healthy reference reef, while the machine learning approach extracted broader soundscape patterns, clustering the restoration site alongside the degraded reference reef. Although this is a single-site study, these findings suggest that a) coral restoration alters reef soundscapes, though recovery time may be taxon-specific, and b) multiple metrics are needed to bridge single-taxon and broad soundscape scales. This study contributes to the evolving field of soundscape ecology in coral reef ecosystems, highlighting the utility of PAM in monitoring changes to reef function through coral restoration.
Jiang, X.; Zhang, Y.; Shu, Z.; Xiao, Z.; Wang, D.
Passive acoustic monitoring (PAM) is increasingly applied in biodiversity research, yet its reliability as a proxy for biodiversity remains insufficiently evaluated. In particular, the spatiotemporal autocorrelation inherent in acoustic indices of PAM is rarely quantified, despite its importance for the standardized application of acoustic monitoring. We conducted an integrated study to investigate these issues using a complete grid-based monitoring system covering the entire region (100 grids of 1 km x 1 km) in southern subtropical climatic zones. Acoustic data from 58 valid sites were combined with camera-trapping and vegetation surveys to evaluate six commonly used acoustic indices in PAM. We found that these indices were more strongly associated with the relative abundance and community diversity metrics of birds and mammals than with species richness. Spatially, the autocorrelation ranges of some acoustic indices (i.e., the Bioacoustic Index (BIO) and the Normalized Difference Soundscape Index (NDSI)) extended to approximately 4 km. Temporally, all indices exhibited significant autocorrelation over 2-5 days, exceeding the typical short-term turnover of bird and mammal activity (1-2 days). Our results indicate that acoustic indices are not direct proxies for species richness but provide complementary information on soundscape dynamics. By explicitly quantifying spatiotemporal autocorrelation, this study offers practical guidance for sampling design and statistical analysis in passive acoustic monitoring, supporting more reliable and efficient biodiversity assessment.
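As a rough illustration of what quantifying temporal autocorrelation in an acoustic index can involve, the sketch below computes the standard sample autocorrelation at a given lag for a daily series of index values. The function name and the toy series are assumptions made for illustration; the abstract does not state which estimator the authors used.

```python
def sample_autocorrelation(series, lag):
    """Sample autocorrelation of a series at a non-negative integer lag.

    Uses the usual estimator: lagged cross-products of deviations from
    the mean, divided by the total sum of squared deviations.
    """
    n = len(series)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Hypothetical daily values of one acoustic index at one site.
daily_index = [1.0, 2.0, 3.0, 4.0, 5.0]
lag_one = sample_autocorrelation(daily_index, 1)
```

Evaluating the function over a range of lags gives an autocorrelation profile, from which one could read off the lag at which values become negligible.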
Stowell, D.; Nolasco, I.; McEwen, B.; Vidana Vila, E.; Jean-Labadye, L.; Benhamadi, Y.; Lostanlen, V.; Dubus, G.; Hoffman, B.; Linhart, P.; Morandi, I.; Cazau, D.; White, E.; White, P.; Miller, B.; Nguyen Hong Duc, P.; Schall, E.; Parcerisas, C.; Gros-Martial, A.; Moummad, I.
Computational bioacoustics has seen significant advances in recent decades. However, the rate of insights from automated analysis of bioacoustic audio lags behind our rate of collecting the data - due to key capacity constraints in data annotation and bioacoustic algorithm development. Gaps in analysis methodology persist: not because they are intractable, but because of resource limitations in the bioacoustics community. To bridge these gaps, we advocate the open science method of data challenges, structured as public contests. We conducted a bioacoustics data challenge named BioDCASE, within the format of an existing event (DCASE). In this work we report on the procedures needed to select and then conduct useful bioacoustics data challenges. We consider aspects of task design such as dataset curation, annotation, and evaluation metrics. We report the three tasks included in BioDCASE 2025 and the resulting progress made. Based on this we make recommendations for open community initiatives in computational bioacoustics.
Uiterwaal, S. F.; La Sorte, F. A.; Coblentz, K. E.; DeLong, J. P.
Motivation: The diet composition of a predator is a direct reflection of its role in a food web, resulting from interactions with prey species. Raptors (including hawks, owls, and falcons) are ubiquitous predators with diverse diets, yet there is no comprehensive database of raptor diet composition. We present a database of over 3500 raw raptor diet records, compiled from more than 1000 studies and representing 173 raptor species from across the world. Our dataset complements existing qualitative summaries of species diets by compiling thousands of quantitative diet "samples" over time and space to present diet data at a uniquely fine resolution. Main types of variable contained: The database comprises published records of raptor diets from pellets, prey remains, direct or photographic observations, prey DNA, and raptor gut or gullet contents. For each diet, we present the taxonomic identity and amounts of consumed prey. We additionally present various metadata for each diet such as location, habitat, and season. Spatial location and grain: The study incorporates diet records collected worldwide, with each record assigned geographic coordinates corresponding to the location where the diet information was obtained. Time period and grain: The database includes diet records from 1893 to 2025. We report a year for each diet record. Major taxa and level of measurement: We recorded raptor diet at the species level, including raptors from three orders: Strigiformes, Falconiformes and Accipitriformes excluding vultures. Most prey are identified to species, but prey taxonomic level varies depending on the extent to which they could be identified. Software format: Diet records and metadata are provided in two files with comma-separated value (.csv) format.
Ostojic, M.; Sethi, S.
With bird populations across the world being impacted by ever-growing anthropogenic pressures, reliable monitoring is essential to help halt or reverse declines. Existing visual bird monitoring approaches, which employ cameras or radars to deliver automated and large-scale monitoring data, face a variety of issues. Image-based species classification is only possible if the fine-scale features of a bird are clear, which can be difficult to achieve in real monitoring contexts without expensive, high-resolution cameras due to occlusion and lighting. Radar and video-based approaches which analyse longer-term flight behaviour over the course of seconds can achieve more reliable results in real monitoring contexts, particularly from greater distances, but still require expensive equipment and do not account for all the possible types of flight patterns. Here we present a novel approach to track a wide range of bird flight patterns using inexpensive equipment. As a proof-of-concept, we demonstrate how our approach can be used to classify birds between four species, Red Kite, Kestrel, Black-Headed Gull and Sparrowhawk, which represent four different types of flight patterns. The balanced accuracy of the classification is 0.5583, with a recall and precision per species that range from 0.2640-0.7750 and 0.4583-0.5962, respectively. Our proof-of-concept study demonstrates how new and existing visual bird monitoring systems can leverage flight patterns to deliver species-level insights at lower costs and on larger scales than before.
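For readers unfamiliar with the metric, balanced accuracy is the unweighted mean of per-class recall, so with four classes a random guesser scores about 0.25. The sketch below shows the calculation on invented toy labels; the label names and values are assumptions for illustration and are not data from the study.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy: the unweighted mean of per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Toy labels, invented for illustration only.
y_true = ["kite", "kite", "kestrel", "kestrel", "gull", "gull", "hawk", "hawk"]
y_pred = ["kite", "gull", "kestrel", "kestrel", "gull", "hawk", "kite", "hawk"]
score = balanced_accuracy(y_true, y_pred)
```

Because each class contributes equally, a classifier cannot reach a high score by favouring the most common species.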
Leone, M.; Rech De Laval, V.; Drage, H. B.; Waterhouse, R. M.; Robinson-Rechavi, M.
Integrating taxonomic data from various sources presents a significant challenge in biodiversity research, due to non-standardized nomenclature and evolving species classifications. Discrepancies between major repositories like the Global Biodiversity Information Facility (GBIF) and the National Center for Biotechnology Information (NCBI), as well as citizen science platforms such as iNaturalist, lead to fragmented and sometimes inaccurate biological data. We present TaxonMatch, a tool designed to address these challenges. TaxonMatch aligns taxonomic names, resolves synonymy, and corrects typographical and structural inconsistencies across databases. We show how it can be used to build a common backbone arthropod taxonomy over NCBI, GBIF and iNaturalist, to find the closest molecular data to a given fossil, and to identify IUCN endangered species with molecular data. TaxonMatch provides a cohesive taxonomic framework and a consistent taxonomic backbone, and can be applied to any taxonomic source. The tool is available at https://github.com/MoultDB/TaxonMatch.
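To give a feel for the typographical side of name alignment, here is a minimal sketch that matches a misspelled name against a reference list using Python's standard difflib module. This is not TaxonMatch's method or API; the function names, the 0.85 cutoff, and the example names are assumptions chosen for illustration.

```python
import difflib


def normalize(name):
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def match_name(query, reference_names, cutoff=0.85):
    """Return the closest reference name, or None if nothing is similar enough.

    Similarity is difflib's ratio; the cutoff is an arbitrary illustrative value.
    """
    lookup = {normalize(name): name for name in reference_names}
    hits = difflib.get_close_matches(
        normalize(query), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# Hypothetical reference list and a query containing a typo.
reference = ["Apis mellifera", "Bombus terrestris"]
best = match_name("Apis  melifera", reference)
```

String similarity alone cannot resolve synonymy, where two valid spellings refer to the same taxon, so a real pipeline would also need curated synonym tables.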
Slooten, E.; Myers, L. S.; Nabe-Nielsen, J.
We developed an agent-based model (ABM) to assess how area-based controls on fishing methods can reduce fishing mortality and population declines. The model incorporates the behavior and distributions of dolphins and fishing vessels, and realistic displacement of fishing effort when protection is extended. Our case study is the New Zealand dolphin (Hector's and Maui dolphins). The model was designed and calibrated using pattern-oriented modeling. Our results show that mortality due to entanglement in fishing gear has been reduced thanks to a gradual increase in dolphin protection. However, current protection is not as effective as previously thought, and scarce populations are negatively affected by Allee effects. Neither national nor international goals for reducing bycatch are met by current dolphin protection. The IUCN has recommended banning gillnet and trawl fisheries in New Zealand waters < 100 m deep. For most New Zealand dolphin populations, this would be effective in achieving national and international goals for reducing bycatch. Only two populations would require additional protection. This modelling approach is also suitable for assessing impacts of bycatch and ship strikes for other marine species, making it suitable for informing management decisions in many regions.
Ward, E. J.; Anderson, S. C.
Spatial and spatiotemporal models are increasingly critical for understanding species distributions, tracking population change, and informing conservation decisions. As biological processes are influenced by increasing external pressures, including human disturbance or environmental change, accurate model predictions become essential for adaptive management. However, the reliability of spatial predictions depends on often-overlooked modelling choices, including the spatial resolution used to approximate underlying processes. Using long-term monitoring data from a large-scale groundfish survey in the California Current ecosystem, we investigated how spatial model complexity affects the quality of ecological predictions and derived indices used for management. We fit spatial and spatiotemporal models of ocean temperature and fish biomass density for 27 commercially important species using varying levels of spatial resolution. We evaluated both in-sample and out-of-sample prediction, and effects on area-weighted biomass indices. Counter to common assumptions, increasing spatial approximation resolution did not universally improve predictions. Our case studies demonstrate that for many datasets, out-of-sample prediction quality peaked at intermediate spatial resolutions and declined at the finest scales. Through simulation testing, we found this pattern was strongest when spatial patterning had a small range and high spatial variance, and observation error was low. For most species, spatial resolution had a minimal effect on biomass trend estimates used in management, but for several commercially important rockfish species, resolution choices substantially affected both the scale and uncertainty of population indices. Our findings demonstrate that spatial model specification can substantially affect ecological inference, with direct implications for management and conservation planning.
We provide practical guidance for ecologists on selecting appropriate spatial complexity through cross-validation. When out-of-sample prediction is a focus, appropriate approximation complexity should improve both parameter estimation accuracy and derived quantities.
Ichinokawa, M.; Okamura, H.
The hockey-stick (HS) stock recruitment relationship (SRR) has been widely used as an empirical alternative to conventional SRRs such as the Beverton-Holt (BH) and Ricker (RI) models. However, the management performance and risks associated with estimating maximum-sustainable-yield (MSY) reference points (RPs) based on HS remain insufficiently understood. This study first defines deterministic and stochastic MSY RPs under the HS model and provides an overview of their properties. We then conduct simulation experiments to investigate the bias and management consequences that arise when MSY RPs are estimated from the HS model (HS-derived MSY RPs) rather than from the true SRR (e.g., BH) across a range of biological and stochastic parameters, with particular focus on scenarios with insufficient data contrast. Our results show that HS-derived MSY RPs tend to exhibit higher bias but lower variance than MSY RPs derived from the true SRR. Management strategy evaluation simulations further reveal that management procedures combining HS-derived MSY RPs with adaptive model learning and some precautionary measures gradually reduce this bias and achieve average spawning biomass and yield that are comparable to those obtained under management based on the true BH SRR. We also show that the management effectiveness of the precautionary measures depends on life-history traits and recruitment variability. These findings indicate that although HS-derived MSY RPs may be biased and require cautious use, combining them with appropriate precautionary measures allows management to remain robust while limiting variability and yield losses. This broadens the range of management options that are available for supporting sustainable fisheries management.
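The two stock-recruitment shapes compared above can be written in a few lines. The sketch below uses common textbook parameterisations (hockey-stick as a slope up to a breakpoint, Beverton-Holt as a saturating curve); the parameter names and values are assumptions for illustration and are not taken from the study.

```python
def hockey_stick(ssb, slope, breakpoint):
    """Hockey-stick recruitment.

    Recruitment rises linearly with spawning biomass (ssb) up to the
    breakpoint and is constant above it.
    """
    return slope * min(ssb, breakpoint)


def beverton_holt(ssb, a, b):
    """Beverton-Holt recruitment: a * ssb / (1 + b * ssb).

    Recruitment approaches the asymptote a / b as ssb grows.
    """
    return a * ssb / (1.0 + b * ssb)


# Hypothetical parameter values, chosen only to make the shapes comparable.
below_break = hockey_stick(50.0, slope=2.0, breakpoint=100.0)
above_break = hockey_stick(200.0, slope=2.0, breakpoint=100.0)
bh_value = beverton_holt(100.0, a=2.0, b=0.01)
```

The hockey-stick curve has a sharp corner where Beverton-Holt bends smoothly, which is one reason reference points derived from the two can differ even when the curves look similar over the observed range.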
Wilbourn, E. K.; Curtis, D.; Kolla, H.; Rai, P.; Lane, P.; McGowen, J.; Lane, T. W.; Poorey, K.
For sustainable algal biomass cultivation, we need substantial improvement in annualized productivity by reducing the frequency of crop failure and improving growth in open raceway pond systems. In this study, high-performing strains were identified and optimized for biomass productivity. We utilized next-generation sequencing methods to quantify the ecological features of open raceway systems cultivated in Arizona. We utilized data from several months of cultivation runs to construct a rich time-series of the ecology dynamics using amplicon sequencing and used custom anomaly detection, "PondSentry", for the early prediction of pond crashes. PondSentry uses tensor decomposition of higher-order joint moments to detect incipient anomalies in multivariate data and displays significant improvements over standard knowledge-based anomaly detection methods. The PondSentry strategy identifies signs of deteriorating pond health at an average of three days before an actual crash event, with rank order of the ecological features plausible for crop failures driven by organisms such as Amoeboaphelidium occidentale FD01. These findings are independently confirmed with PCR and microscopy studies at an Arizona cultivation site. PondSentry's time-series-based anomaly detection of crashes provides a suitable monitoring strategy for eukaryotic crash agents in unialgal culture. The early warnings can be used to time interventions or harvests to prevent biomass loss. The PondSentry strategy strengthens the role of data science and data-driven methods in algal cultivation and can increase the feasibility of algal-biomass based products.
van Moorsel, S. J.; Schmid, B.; Niederberger, M.; Huggel, J.; Scherer-Lorenzen, M.; Rascher, U.; Damm, A.; Schuman, M. C.
Field-based monitoring of tree species in forests is often sparse due to logistical constraints. Remote sensing enables repeated, spatially contiguous collection of reflectance data across large areas. Tree species classification accuracy using such data is variable, likely because most studies use observational datasets where species occurrence correlates with environmental variation. We used two sites of a tree biodiversity experiment in Germany (BIOTREE: Kaltenborn and Bechstedt), where different species have been planted with high replication under controlled diversity levels, to assess how well tree species could be classified using reflectance data from airborne imaging spectroscopy and different classification methods (linear discriminant analysis, LDA, and a non-linear support vector machine, SVM). Reflectance data for 589 wavelengths between 400-2400 nm were acquired at 1 m spatial resolution during peak growing season. Reflectance spectra showed large and significant variation between taxonomic classes, orders, and species, and weak, but still significant, interactions between classes or orders and diversity levels. Classification accuracy reached 100% in training datasets, 77%-83% for the four species in Kaltenborn prediction datasets, and 31%-49% for the 16 species in Bechstedt prediction datasets. LDA provided more accurate predictions than SVM; and using similarly-spaced original wavelengths with LDA was as efficient as using principal components derived from the original data. While airborne imaging spectroscopy effectively distinguished up to four tree species in our datasets, classification accuracy was lower in more species-rich plots. In these cases, the methodology may be more useful for functional diversity monitoring than for tree species classification.
Martemyanov, V.; Soukhovolsky, V.; Dubatolov, V.; Kovalev, A.; Tarasova, O.
Methods for estimating and modeling the long-term and short-term adult flight dynamics of the conifer silk moth Dendrolimus superans (Lepidoptera: Lasiocampidae) are examined. The analysis uses light trap adult catch data collected over 21 years, from 2005 to 2025. Three models of adult flight are considered: a flight-initiation model driven by weather factors, an autoregressive model of long-term catch dynamics, and a binary model of seasonal catch. For the flight-initiation model, we propose estimating the accumulated temperature sum ST from the date when the first derivative of the remote sensing vegetation index NDVI becomes positive until the date of the first adult capture of the season. ST is shown to be sufficiently stable across all years of observation, with flight each year beginning after this temperature sum is reached. The second model demonstrates that the long-term light trap catch time series is well described by a second-order autoregressive model AR(2), in which the catch of the current year depends on catches from the two preceding years. This long-term series is compared with a previously studied larval population density series of the Siberian silk moth; both are shown to be AR(2) series with similar coefficient values, which suggests that adult catch data may serve as a proxy for absolute larval population density. In the third model, we describe the transition from absolute-scale seasonal catch dynamics (number of adults per day) to a binary scale (0, 1), where 0 denotes days on which no adults were attracted to the trap, and 1 denotes days on which at least one individual was captured. The seasonal absolute catch series is thereby transformed into a binary series of zeros and ones, and relationships between adjacent values in such a binary series are examined.
A linear relationship between the absolute and binary seasonal dynamics series is demonstrated, making it possible to estimate absolute catches from binary catch values and to analyze seasonal flight in sparse pest populations. This potentially opens new avenues for understanding how outbreak populations function at chronically low density. Author summary: Forest pests can cause catastrophic damage, yet predicting their outbreaks remains challenging. During periods of low population density, standard monitoring methods become labor-intensive and uninformative, while the transition to an outbreak often occurs unexpectedly. Using a 21-year dataset of adult Siberian silk moth (Dendrolimus superans) captures from light traps, we developed an approach combining three complementary models. First, we showed that moth flight begins upon reaching a specific temperature sum, with the starting point determined by NDVI vegetation index dynamics rather than a calendar date, making the forecast more ecologically relevant. Second, long-term adult population dynamics follow a second-order autoregressive model AR(2), matching the dynamics previously observed for larval populations. This establishes light trap data as a reliable proxy for absolute population density when ground surveys are impractical. Third, we introduced a method to analyze seasonal flight using binary data (presence/absence of moths per day), which we showed is linearly related to absolute abundance. This enables studying population dynamics during periods of extremely low density, when traditional methods fail. Our approach opens new possibilities for early warning systems to detect when a population risks transitioning from a latent state to an outbreak phase.
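Two of the building blocks described above are simple enough to sketch directly: the conversion of daily catch counts to a binary presence/absence series, and a one-step AR(2) prediction in which the current value depends on the two preceding ones. The function names, coefficients, and toy numbers below are assumptions for illustration; the abstract does not report fitted coefficients.

```python
def to_binary(daily_catch):
    """Map daily catch counts to 0 (no adults caught) or 1 (at least one)."""
    return [1 if count > 0 else 0 for count in daily_catch]


def ar2_predict(previous, before_previous, phi1, phi2, intercept=0.0):
    """One-step AR(2) prediction.

    Computes x_t = intercept + phi1 * x_(t-1) + phi2 * x_(t-2),
    with the noise term omitted.
    """
    return intercept + phi1 * previous + phi2 * before_previous


# Hypothetical daily catches for part of one season.
binary_series = to_binary([0, 3, 0, 1, 0, 0, 7])

# Hypothetical coefficients and catches from the two preceding years.
next_year = ar2_predict(10.0, 4.0, phi1=0.5, phi2=-0.25)
```

In practice the AR(2) coefficients would be estimated from the multi-year series, for example by least squares, before being used for prediction.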
Santos, R.; Oliveira-Rodrigues, C.; Silva, I. M.; Valente, R.; Afonso, L.; Gil, A.; Vinagre, C.; Sambolino, A.; Fernandez, M.; Alves, F.; Sousa-Pinto, I.; Correia, A. M.
Vessel-whale collisions are a growing global concern and remain challenging to quantify. Therefore, the use of proxies, such as Close Encounters (CEs) that comprise Surprise Encounters (SEs) and Near-Miss Events (NMEs), has been proposed and widely employed to assess collision risk. To better understand this risk in the Eastern North Atlantic, where maritime traffic is intensive, this study aimed to redefine and quantify CEs, and to assess detectability-related variables that may affect CE identification. CEs were assessed using a cetacean occurrence dataset collected between 2012 and 2024 on board cargo ships and oceanographic vessels. CE thresholds were redefined based on Time to Potential Collision (TPC), rather than distance alone (as described in the literature), to allow a more dynamic, risk-based, and speed-sensitive approach. In total, 1226 sightings of whales (baleen, sperm, and beaked whales) were recorded, of which 37.4% were classified as SEs and 2.0% as NMEs. The sperm whale, Physeter macrocephalus, was the species most frequently involved in CEs (13.9% of all CEs), followed by Cuvier's beaked whale, Ziphius cavirostris (11.8%). A Generalized Additive Model was used to assess the influence of detectability-related variables (i.e., meteorological conditions, whale taxa, vessel characteristics, and Marine Mammal Observer (MMO) experience) on TPC. Significantly lower TPC values were observed with beaked whales, cargo ships, poor visibility conditions, and less experienced MMOs. The results of this study provide a CE assessment in this region and contribute to the ongoing efforts to standardize CE quantification, by using TPC as a metric. This work also highlights the importance of decreased speeds and the presence of experienced MMOs on board to increase detection probability and TPC, thereby potentially minimizing collision risk.
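The reason a time-based threshold is speed-sensitive can be shown with a short calculation. The sketch below assumes TPC is simply the sighting distance divided by vessel speed, with the animal treated as stationary on the vessel's track; that simplification, the function name, and the example numbers are assumptions for illustration, and the abstract does not give the exact definition or threshold values.

```python
KNOT_IN_M_PER_S = 1852.0 / 3600.0  # one knot is one nautical mile per hour


def time_to_potential_collision(distance_m, speed_knots):
    """Seconds until the vessel covers distance_m at a constant speed.

    Simplifying assumption: the whale is stationary and directly ahead.
    """
    if speed_knots <= 0:
        raise ValueError("speed_knots must be positive")
    return distance_m / (speed_knots * KNOT_IN_M_PER_S)


# The same sighting distance yields half the reaction time at twice the speed.
tpc_slow = time_to_potential_collision(1852.0, 10.0)
tpc_fast = time_to_potential_collision(1852.0, 20.0)
```

Under a distance-only rule both sightings would be treated identically, whereas a TPC threshold separates them.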
Koshkarov, A.; Tahiri, N.
Phylogenetic trees represent the evolutionary histories of taxa and support tasks such as clustering and Tree of Life reconstruction. Many established comparison methods, including the Robinson-Foulds (RF) distance, assume identical taxon sets. A methodological gap remains for trees with distinct but overlapping taxa. Existing approaches either prune non-common leaves, which can discard information, or complete both trees such that they share the same taxa. Completion is more comprehensive, but current methods typically ignore branch lengths, which are essential for identifying evolutionary patterns. This paper introduces k-Nearest Common Leaves (k-NCL), an algorithm for completing rooted phylogenetic trees defined on different but overlapping taxa. The method uses branch lengths and topological characteristics and does not rely on a specific distance measure. The k-NCL algorithm is designed to preserve evolutionary relationships in the trees under comparison. The running time is O(n^2), where n is the size of the union of the two leaf sets. Additional properties include preservation of original distances and topology, symmetry, and uniqueness of the completion. Implemented in Python, k-NCL is evaluated on biological datasets of amphibians, birds, mammals, and sharks. Experimental results show that RF combined with k-NCL improves phylogenetic tree clustering performance compared to the RF(+) tree completion approach. Availability and implementation: An open-source implementation of k-NCL in Python and the datasets used in this study are available at https://github.com/tahiri-lab/KNCL.
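To make the pruning baseline concrete, the sketch below computes a rooted Robinson-Foulds distance after restricting both trees to their common leaves. This is the prune-then-compare approach that the abstract contrasts with completion; it is not k-NCL, and it ignores branch lengths. The nested-tuple tree encoding and all function names are assumptions for illustration.

```python
def leaves(tree):
    """Return the set of leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def prune(tree, keep):
    """Drop leaves not in keep and collapse nodes left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def clusters(tree):
    """Return the leaf sets below each internal node, excluding the root."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    found.discard(walk(tree))
    return found


def rf_after_pruning(tree_a, tree_b):
    """Rooted RF distance on the leaves shared by both trees."""
    common = leaves(tree_a) & leaves(tree_b)
    if not common:
        raise ValueError("the trees share no leaves")
    pruned_a, pruned_b = prune(tree_a, common), prune(tree_b, common)
    return len(clusters(pruned_a) ^ clusters(pruned_b))


# Two toy trees sharing leaves A-D; E and F are each unique to one tree.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "F"))
distance = rf_after_pruning(tree_1, tree_2)
```

Here the placements of E and F are discarded before comparison, which is the information loss that completion methods aim to avoid.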
Hayes, R. A.; Kern, A. D.; Ponisio, L. C.
Pollen is a robust and widespread substance that captures a historical snapshot of a specific time and place, and it can be used to track movements through space by examining the pollen deposited on various objects. Palynology, the study of pollen, is used across fields such as conservation, natural history, and forensics, where it is particularly useful for tracing the origin and movement of objects. However, pollen has remained underutilized due to the difficulty of distinguishing many pollen taxa beyond the family level and limited pollen reference material to support location predictions. With recent developments in pollen DNA metabarcoding these issues have been rectified, but much of the available pollen data are primarily from wind-pollinated species, which are widespread and less informative of specific sample locations. Bee-collected pollen presents an untapped resource in training predictive models to geolocate sample origin. Here we compiled bee-collected pollen DNA sequence relative abundance data from three projects in the western U.S. and assessed the accuracy of supervised machine learning models to predict the location of sample origin based solely on pollen assemblage, without the need of incorporating additional data. Random Forest and k-Nearest Neighbors models yielded high accuracy across all projects. We also found that models trained on taxonomically clustered pollen assigned sequence variants (ASVs) performed slightly better than those trained on raw sequence data, but the difference was minor, indicating that models trained on raw sequence data can reliably predict location and avoid the time-consuming taxonomic assignment process. Our results demonstrate the utility of repurposing bee-collected pollen for geolocation and provide a framework for employing supervised machine learning in future geolocation efforts. 
Highlights: Bee-collected pollen metabarcoding data was used to accurately predict sample origin. Random Forest and k-Nearest Neighbors algorithms were most accurate with lowest error. Taxonomically-classified and raw DNA sequence data training sets performed comparably.
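As a minimal sketch of how a k-Nearest Neighbors classifier can assign a location from a pollen assemblage, the code below treats each sample as a vector of relative abundances and takes a majority vote among the closest training samples. The Euclidean distance, k = 3, the site labels, and the abundance values are assumptions for illustration and are not drawn from the study.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training vectors."""
    ranked = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical relative abundances of three pollen taxa per sample.
train_vectors = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
]
train_labels = ["site_A", "site_A", "site_B", "site_B"]
predicted = knn_predict(train_vectors, train_labels, [0.75, 0.15, 0.10])
```

Relative abundances are compositional, so other distance measures may suit them better than Euclidean distance; the choice here is only for brevity.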
Halpern, M.
Background: Data extraction is the primary bottleneck in meta-analysis, consuming weeks of researcher time with single-extractor error rates of 17.7%. Existing LLM-based systems achieve only 26-36% accuracy on continuous outcomes, and no study has validated AI-extracted continuous data against multiple independent datasets using formal equivalence testing. Methods: A single AI agent (Claude Opus 4.6) extracted treatment means, control means, sample sizes, and variance measures from source PDFs across five published agricultural meta-analyses spanning zinc biofortification, biostimulant efficacy, biochar amendments, predator biocontrol, and elevated CO2 effects on plant mineral nutrition. Observations were matched to reference standards using an LLM-driven alignment method. Validation employed proportional TOST equivalence testing, ICC(3,1), Bland-Altman analysis, and source-type stratification. Results: Across five datasets, the agent produced 1,149 matched observations from 136 papers. Pearson correlations ranged from 0.984 to 0.999. Proportional TOST confirmed statistical equivalence for all five datasets (all p < 0.05). Table-sourced observations achieved 5.5x lower median error than figure-sourced observations. Aggregate effects were reproduced within 0.01-1.61 pp of published values. Independent duplicate runs confirmed extraction stability (within 0.09-0.23 pp). Conclusions: A single AI agent achieves statistical equivalence with human-extracted meta-analysis data across five independent agricultural datasets. The approach reduces extraction cost by approximately one to two orders of magnitude while maintaining accuracy sufficient for aggregate meta-analytic pooling.
Highlights
What is already known
- Data extraction is the primary bottleneck in meta-analysis, with single-extractor error rates of 17.7%
- Existing LLM-based extraction systems achieve only 26-36% accuracy on continuous outcomes
- No study has validated AI extraction against multiple independent datasets using formal equivalence testing
What is new
- A single AI agent achieves statistical equivalence with human-extracted data across five agricultural meta-analyses (1,149 observations, 136 papers)
- LLM-driven alignment resolves the previously underappreciated bottleneck of moderator matching, improving correlations from 0.377-0.812 to 0.984-0.997 without changing extracted values
- Table-sourced observations achieve 5.5x lower error than figure-sourced data
Potential impact for RSM readers
- Provides a validated, reproducible workflow for AI-assisted data extraction in meta-analysis
- Demonstrates that most apparent "extraction error" in validation studies is actually alignment error
- Offers practical quality signals (source-type labeling) for downstream meta-analysts
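As a rough illustration of the Bland-Altman analysis named among the validation methods above, the sketch below computes the mean difference (bias) and the conventional 95% limits of agreement for paired values. The function name `bland_altman` and the sample numbers are hypothetical and are not taken from the study; the preprint's actual implementation is not described in the abstract.

```python
import statistics


def bland_altman(extracted, reference):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (extracted - reference); the limits of agreement
    are bias +/- 1.96 sample standard deviations of the differences.
    """
    if len(extracted) != len(reference) or len(extracted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [e - r for e, r in zip(extracted, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired values, for illustration only (not study data).
ai_values = [10.2, 15.1, 20.0, 24.8]
human_values = [10.0, 15.0, 20.0, 25.0]
bias, lower, upper = bland_altman(ai_values, human_values)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

A bias near zero with narrow limits would suggest close agreement between the two extraction sources; how narrow is narrow enough depends on the outcome scale and is a judgment the sketch does not make.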
Hyman, A. C.; Collins, A.; Ramsay, C.; Allen, M. S.; Wilms, S.; Barbieri, L.; Frazer, T. K.
Accurate estimation of post-release survival is fundamental to fisheries stock assessment and effective management. Conventional tag-return studies and acoustic telemetry are commonly used to estimate this probability, yet each approach has limitations when applied independently. Using gag (Mycteroperca microlepis) as a case study, we integrated data from a large-scale conventional tagging program and an acoustic telemetry experiment within a discrete-time statistical modeling framework that links relative recapture risk with telemetry-derived fate. This approach enabled estimation of post-release survival across a broad gradient of capture depths representative of recreational fishing conditions. Estimated survival was high in shallow waters (approximately 97%) but declined with increasing capture depth, consistent with depth-related barotrauma. Applying model predictions to depth distributions from the recreational fishery yielded annual and monthly post-release survival probabilities. Annual estimates were consistent with values assumed in recent stock assessments, while monthly values highlighted seasonal patterns potentially relevant for management. This integrated framework advances post-release survival estimation by combining the extensive sample sizes and environmental coverage characteristic of conventional tagging data with the direct fate observations provided by acoustic telemetry, and offers a transferable approach for other highly targeted fisheries.
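The step of applying depth-specific predictions to a fishery's depth distribution can be pictured as a weighted average. The sketch below is an assumption about the general idea, not the authors' model: the function name, the depth bins, the shares of releases, and the deeper-water survival values are all hypothetical; only the shallow-water figure of roughly 0.97 echoes the abstract.

```python
def fishery_survival(depth_share, survival_at_depth):
    """Average depth-specific survival, weighted by the share of releases per bin."""
    total = sum(depth_share.values())
    if total <= 0:
        raise ValueError("depth shares must sum to a positive number")
    weighted = sum(
        share * survival_at_depth[depth_bin]
        for depth_bin, share in depth_share.items()
    )
    return weighted / total  # normalise in case shares do not sum to 1


# Hypothetical inputs, for illustration only (not values from the study).
depth_share = {"0-20 m": 0.5, "20-40 m": 0.3, "40+ m": 0.2}
survival_at_depth = {"0-20 m": 0.97, "20-40 m": 0.85, "40+ m": 0.70}

overall = fishery_survival(depth_share, survival_at_depth)
print(f"fishery-wide post-release survival: {overall:.2f}")
```

Repeating the same calculation with a separate depth distribution for each month would give monthly values of the kind the abstract mentions.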
Freedman, M.
Community science data are increasingly recognized as important resources for biodiversity research, in part because of the spatial and temporal resolution that they afford. While these data are useful for applications such as describing occurrence patterns, tracking movement of migratory animals, and recording phenological events, they can also be probed for "second-order" purposes, such as documenting species interactions. Here, I present a dataset of more than 35,000 annotated interactions between monarch butterfly (Danaus plexippus) larvae and their associated host plants from the community science platform iNaturalist. I document more than 70 unique species of milkweed hosts (Apocynaceae: Asclepiadoideae) used by monarch larvae, including a number of previously undocumented interactions. Monarchs show strong seasonal turnover in the species of host plants used across the migratory cycle, highlighting the importance of early season hosts like Asclepias viridis and A. asperula in eastern North America and A. californica and A. cordifolia in the west. I also demonstrate that non-native horticultural milkweed species have increased the spatial extent of monarch breeding during winter (November - February) by more than 60%, a pattern previously suggested from observational data but not formally quantified until now. To my knowledge, this represents the largest analysis to date of species interactions using unstructured community science data and highlights the value of platforms like iNaturalist for conducting fundamental research in ecology and conservation.